Vincent Clemson
Education
Background
In my blog, I show how to use open source tools to analyze geospatial data. Previously, I worked at Booz Allen, where I conducted a study on NATO[1] object detectors using commercial satellite imagery as part of the AI Edge Kit team. Before that, I worked at Peraton for 5 years on a GEOINT[2] performance modeling team, where I analyzed billions of records of transactional data on geospatial imagery stored by the NGA[3] to help optimize the NSG[4] and benefit the US military. There, I promoted open source software development best practices and the use of data science tooling in Python & R. I also managed my team’s government network GitLab & corporate GitHub organizations, which grew from 0 to 300+ repositories over my tenure.
What I’m interested in doing
Open Source Projects
- Visualized natural disasters around the 🌍 using dynamic tiling of satellite imagery & Maxar’s Open Data Program
- Built Spatial Machine Learning Models using {mlr3} to run within GitHub Actions
- Rebuilt Tomas Beuzen’s Deep Learning with PyTorch & ported it to render w/ nbdev & Quarto
- Developed & Deployed analytic Dash & {shiny} web apps for NGA mission analysts to predict years of geospatial coverage of Earth Observation Satellite Systems
- Improved dev & data science workflows by teaching engineers to version control their code using Git
- Created Quarto, R & RStudio CLI utility shims to handle multiple Quarto/R installations
- Started developing an R package, {leaflet.super}, to visualize big geospatial data with Leaflet
- Built R language GitHub Actions to run R environments remotely & contributed back to GitHub
- My old team had massive amounts of imagery analyst user activity data & constantly needed to find out what was important to the analysts, so I built exploratory unsupervised clustering tools to help us do so
- Military wartime border region behavior is of interest within the geospatial intelligence domain, so I built tools for analyzing satellite image distance to border regions & for creating new geometric border regions
- Quantifying size & types of collections (e.g. different camera sensor modes) is critical in Earth observation satellite systems engineering, so I’ve built different analytical product mapping tools to help do so (e.g. advanced Plotly map animations & interactive Leaflet htmlwidget heatmaps)
- I version control my macOS dot-profile & config files for rapid dev/data-sci setup
Programming Skills
Below is a non-exhaustive, high-level list of the technologies I work with.
Python, Conda/Mamba, Jupyter, nbdev, Sphinx, Cookiecutter, SQL, JavaScript, Bash, Zsh, tmux, VS Code, R, Quarto, R Markdown, GNU Make, asciinema, Leafmap, Google Earth Engine, QGIS, GDAL
Career Path
AI Engineer – Booz Allen Hamilton
- Conducted a statistical performance analysis for evaluating 3rd party geospatial computer vision algorithms 🛩🏘️ (e.g. aircraft & building detectors using Maxar satellite imagery, polars, {sf}, {terra}, & Quarto to document results)
- Built interactive satellite imagery data lakes (e.g. using Leafmap, TiTiler, & STAC[6])
- Built interactive statistical reports for JSOC[7] on soldier biometric performance data w/ {flexdashboard}
Systems Engineer - Peraton
Data science on the NGA’s enterprise systems engineering contract
- Analyzed the metadata & transactions of all Geospatial Intelligence imagery products across the IC
- Wrangled large amounts of historical categorical, numerical, spatial, & temporal data using extremely efficient in/out of memory tools (e.g. data.table, Apache Arrow, & Parquet file data lakes)
- Prototyped, developed, & maintained modeling tools to conduct EDA on data to analyze patterns and trends (e.g. ggplot2, sf, Plotly, Matplotlib, Leaflet, Dash, Shiny, Docker, & Cloud Foundry)
- Performed spatial relational/geometric operations on datasets to enrich feature sets (e.g. border regions)
- Used reproducible computational mediums to conduct workflows (e.g. R Markdown, Jupyter notebook)
- Conducted statistical analysis on the performance, sizing, & budgeting of NSG imagery & their driving relationships (e.g. linear trend models, bandwidth models, human-in-the-loop supervised/unsupervised EDA ML workflows)
- Conducted statistical & orbital mechanics analyses on the performance of an ABI[8] satellite / ground sensor system
- Worked on a distributed team & operated in a cloud computing environment. Experience with building a cloud from the ground up, config management, & permissions (e.g. AWS, RStudio Server Pro, Unix/Linux, VPC)
Application Developer Intern - JP Morgan Chase
- Worked on an Agile development team in JPMC’s Technology Analyst Program. Our team of 6 interns built a full-stack Java-Spring tool aggregating data for the planning & execution of the migration & decommissioning of legacy JPMC data center servers. Worked on both the front & back end. Served as Scrum Master.
Data Analytics Intern - IMG Learfield & Penn State Athletics
Season Ticket Holder Survey Analysis
- Performed Decision Tree Modelling in R for finding trends between customers and ticket sale renewals
- Mined customer survey data using NLP[9] techniques & the NLTK[10] in Python (e.g. tokenizers, collocations)
Machine Learning Skills
Spatial Cross-Validation Techniques, Discrete Event Simulation, Generalized Linear Models, Ensemble Models, Unsupervised Learning, Principal Components Analysis, Clustering Techniques, Dimensionality Reduction (Feature Selection)
CNNs (Convolutional Neural Networks), GANs (Generative Adversarial Networks), Gradient Descent, Regularization, Decision Boundary, One-vs-All Multiclass Classification, Neural Networks, Vectorization, Backpropagation and Advanced Optimization techniques
[1] NATO - North Atlantic Treaty Organization
[2] GEOINT - Geospatial Intelligence
[3] NGA - National Geospatial-Intelligence Agency
[4] NSG - National System for Geospatial-Intelligence
[5] SSG - Static Site Generator
[6] STAC - SpatioTemporal Asset Catalogs
[7] JSOC - Joint Special Operations Command
[8] ABI - Activity Based Intelligence
[9] NLP - Natural Language Processing
[10] NLTK - Natural Language Toolkit